Abstract
We introduce the fundamental concepts, typical approaches to experimental design and measurement, and parametric model estimation via MLE in Behavioral Decision Theory (BDT). In the following 1.5 hours of this tech tutorial in the scope of DSC 2022 we will consider only choice under risk (and not uncertainty) and study two important models. We will review and provide tools in R for the MLE estimation of the (normative) Expected Utility Theory (EUT, von Neumann–Morgenstern) and the (descriptive) Cumulative Prospect Theory (Tversky–Kahneman). We approach model selection in BDT via k-fold cross-validation, which is characteristic of the ML community but rarely used in empirical work in microeconomics, behavioral economics, and cognitive psychology. However, ordinary approaches to k-fold CV fail here because of the deep dependence between the experimental design (i.e. the structure of the data set) and the model estimation procedures. Understanding the fundamentals of BDT and being able to estimate parametric models of choice under risk is a prerequisite for the design of any behavioral intervention, an approach that gained enormous popularity and attention following the recognition of the central importance of behavioral economics to the study of human decision making (e.g. Kahneman’s (2002) and Thaler’s (2017) Nobel Memorial Prizes in Economic Sciences). Beyond that, characterizations of individual decision makers by estimated BDT model parameters provide new sets of features for any further risk modeling.
Feedback should be sent to goran.milovanovic@datakolektiv.com. This notebook accompanies the BEHAVIORAL DECISION THEORY (BDT) EXPERIMENTAL DESIGN, MODEL ESTIMATION AND SELECTION Data Science Conference 2022 Tech Tutorial, 15 November 2022, 16:30 - 18:00 CET
# - setup
data_dir <- paste0(getwd(), "/_data/")
meq_data_dir <- paste0(data_dir, "_meq/")
kt92_data_dir <- paste0(data_dir, "_kt92/")
# - libs
library(tidyverse)
library(scales)
library(plotly)
library(snowfall)
The risky prospect and the sure thing have the same expected value:
\[E[Risky Prospect] = E[X=100, p=.5;X=0, p=.5] = .5\times100 + .5\times0 = 50\]
\[E[Sure Thing] = 50\]
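A quick check of the two expected values in R (a minimal sketch, independent of the tutorial code):

```r
# - expected value of the risky prospect: E[X=100, p=.5; X=0, p=.5]
outcomes <- c(100, 0)
probs <- c(.5, .5)
ev_risky <- sum(probs * outcomes)
ev_sure <- 50
ev_risky == ev_sure  # TRUE: the two options have the same expected value
```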
Why are people risk averse?
Daniel Bernoulli, “Specimen Theoriae Novae de Mensura Sortis”, Commentarii Academiae Scientiarum Imperialis Petropolitanae, Tomus V, 1738, pp. 175-192
Risk aversion follows as a consequence of the characteristics of the decision maker’s utility function.
The power utility function:
\[u(x) = x^\rho\] The decision maker is risk averse for \(0 < \rho < 1\) and risk seeking for \(\rho>1\); risk seeking is not seen as a rational risk attitude.
# - power utility function over gains
x <- seq(1, 100, .1)
rho = .67
ux <- x^rho
ra_frame <- data.frame(value = x,
utility = ux)
ggplot(data = ra_frame,
aes(x = value, y = utility)) +
geom_point(size = .25, color = "blue") +
geom_segment(aes(x = 50,
y = 0,
xend = 50,
yend = 50^rho),
colour = "black",
linetype = "dashed",
size = .1) +
geom_segment(aes(x = 0,
y = 50^rho,
xend = 50,
yend = 50^rho),
colour = "black",
linetype = "dashed",
size = .1) +
geom_segment(aes(x = 100,
y = 0,
xend = 100,
yend = 100^rho),
colour = "black",
linetype = "dashed",
size = .1) +
geom_segment(aes(x = 0,
y = 100^rho,
xend = 100,
yend = 100^rho),
colour = "black",
linetype = "dashed",
size = .1) +
geom_segment(aes(x = 0,
y = (100^rho)/2,
xend = 100,
yend = (100^rho)/2),
colour = "red",
linetype = "dashed",
size = .1) +
ggtitle("Power utility function, rho=.67, Risk Aversion") +
theme_bw() +
theme(panel.border = element_blank()) +
theme(plot.title = element_text(hjust=.5, size = 10))
# - power utility function over gains
x <- seq(1, 100, .1)
rho = 1.2
ux <- x^rho
ra_frame <- data.frame(value = x,
utility = ux)
ggplot(data = ra_frame,
aes(x = value, y = utility)) +
geom_point(size = .25, color = "red") +
geom_segment(aes(x = 50,
y = 0,
xend = 50,
yend = 50^rho),
colour = "black",
linetype = "dashed",
size = .1) +
geom_segment(aes(x = 0,
y = 50^rho,
xend = 50,
yend = 50^rho),
colour = "black",
linetype = "dashed",
size = .1) +
geom_segment(aes(x = 100,
y = 0,
xend = 100,
yend = 100^rho),
colour = "black",
linetype = "dashed",
size = .1) +
geom_segment(aes(x = 0,
y = 100^rho,
xend = 100,
yend = 100^rho),
colour = "black",
linetype = "dashed",
size = .1) +
geom_segment(aes(x = 0,
y = (100^rho)/2,
xend = 100,
yend = (100^rho)/2),
colour = "red",
linetype = "dashed",
size = .1) +
ggtitle("Power utility function, rho=1.2, Risk Seeking") +
theme_bw() +
theme(panel.border = element_blank()) +
theme(plot.title = element_text(hjust=.5, size = 10))
Daniel Bernoulli, 1738. People make choices under risk by following the principle of the Maximum Expected Utility (MEU):
\[EU(Prospect) = \sum_{i=1}^{N}p_iu(x_i)\]
The approach was axiomatized only in 1947 by John von Neumann and Oskar Morgenstern in their “Theory of Games and Economic Behavior”: they provided a set of axioms introducing the preference relation, and a proof of a representation theorem showing that any decision maker who obeys the axioms of rational choice is at the same time an EU maximizer (and vice versa). Since then: the von Neumann-Morgenstern utility (or vNM Expected Utility Theory).
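As a small illustration (not part of the tutorial code that follows), the MEU principle with a power utility explains the opening choice: the concave utility makes the sure thing more attractive than a risky prospect of equal expected value.

```r
# - MEU with a power utility, rho = .67 (risk averse)
utility <- function(x, rho) x^rho
rho <- .67
eu_risky <- .5 * utility(100, rho) + .5 * utility(0, rho)
eu_sure  <- utility(50, rho)
eu_sure > eu_risky  # TRUE: the sure thing has the higher expected utility
```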
People are risk seeking in the domain of losses.
Example. The “Asian disease” framing problem (Tversky & Kahneman, 1981):
Imagine that the US is preparing for the outbreak of an unusual Asian disease, which is expected to kill 600 people. Two alternative programs to combat the disease have been proposed.
In the gain frame, participants choose between Program A (“200 people will be saved”) and Program B (“a 1/3 probability that 600 people will be saved, and a 2/3 probability that no people will be saved”): the majority prefer the sure Program A. This finding illustrates risk aversion in the domain of gains.
In the loss frame, the same options are described as Program C (“400 people will die”) and Program D (“a 1/3 probability that nobody will die, and a 2/3 probability that 600 people will die”): the majority now prefer the risky Program D. This finding illustrates risk seeking in the domain of losses.
The Reference-Dependent Utility Function:
# - power reference-dependent utility function over gains and losses
x <- seq(-100, 100, .1)
rho = .67
ux <- ifelse(x>=0, x^rho, -(-x)^rho)
ra_frame <- data.frame(value = x,
utility = ux)
ggplot(data = ra_frame,
aes(x = value, y = utility)) +
geom_point(size = .25, color = "blue") +
geom_hline(yintercept=0, size = .1, colour = "black", linetype = "dashed") +
geom_vline(xintercept=0, size = .1, colour = "black", linetype = "dashed") +
ggtitle("Reference-Dependent Power Utility function, rho=.67") +
theme_bw() +
theme(panel.border = element_blank()) +
theme(plot.title = element_text(hjust=.5, size = 10))
Kahneman, Knetsch & Thaler experiment, 1991.
Half of the participants in the experimental study receive something for free at the beginning of the session, e.g. a beautiful cup.
Those who own the item state the lowest price at which they would sell it; those who do not state the highest price they would pay for it.
Median selling price: $7.12
Median buying price: $2.87
That is a ratio of about 2.48 in favor of the selling price: owners demand far more to part with the item than non-owners are willing to pay for it.
We need another correction of the utility function. Enter Loss Aversion, \(\lambda\):
# - power reference-dependent utility function over gains and losses
x <- seq(-100, 100, .1)
rho = .67
lambda = 2.2
ux <- ifelse(x>=0, x^rho, -lambda*(-x)^rho)
ra_frame <- data.frame(value = x,
utility = ux)
ggplot(data = ra_frame,
aes(x = value, y = utility)) +
geom_point(size = .25, color = "blue") +
geom_hline(yintercept=0, size = .1, colour = "black", linetype = "dashed") +
geom_vline(xintercept=0, size = .1, colour = "black", linetype = "dashed") +
ggtitle("Loss Averse Reference-Dependent Power Utility function, \nrho=.67, lambda=2.2") +
theme_bw() +
theme(panel.border = element_blank()) +
theme(plot.title = element_text(hjust=.5, size = 10))
If \(\lambda > 1\), we observe loss aversion; however, if \(0<\lambda<1\), we observe gain seeking, and that happens in empirical settings more often than people think.
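A minimal numerical illustration of the asymmetry (illustrative parameter values): with \(\lambda > 1\), a loss of a given magnitude carries more weight than a gain of the same magnitude.

```r
# - losses loom larger than gains when lambda > 1
rho <- .67
lambda <- 2.2
u <- function(x) ifelse(x >= 0, x^rho, -lambda * (-x)^rho)
abs(u(-100)) / u(100)  # the ratio equals lambda, since |u(-x)| = lambda * x^rho
```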
But there is more to it.
The four-fold pattern of risk attitudes (Kahneman & Tversky), e.g.
and of course we observe the reflection effect in the domain of losses:
It is as if there were a probability weighting function \(w(p)\) such that \(w(p) > p\) for small \(p\) and \(w(p) < p\) for high \(p\), with an inflection point somewhere in between. Many functional forms were proposed for \(w(p)\); we will mention only Prelec’s one-parameter form (1998):
\[w(p)=\exp(-(-\ln p)^\gamma), \quad 0<\gamma<1 \]
# - Prelec's single-parameter probability weighting function
p_weight <- function(p, gamma) {
wp <- exp(-(-log(p))^gamma)
return(wp)
}
p <- seq(0,1,.01)
gamma <- .65
wp <- p_weight(p, gamma)
wp_frame <- data.frame(probability = p,
w_p = wp)
ggplot(data = wp_frame,
aes(x = probability, y = w_p)) +
geom_path(size = .25, color = "blue", group=1) +
ggtitle("Prelec's one-parameter w(p), gamma=.65") +
geom_segment(aes(x = 0,
y = 0,
xend = 1,
yend = 1),
colour = "black",
linetype = "dashed",
size = .1) +
theme_bw() +
theme(panel.border = element_blank()) +
theme(plot.title = element_text(hjust=.5, size = 10))
Finally, putting all the pieces together:
\[u_g(x)=x^{\rho_g}\]
\[u_l(x)=-\lambda(-x)^{\rho_l}\]
\[w_g(p)=\exp(-(-\ln p)^{\gamma_g})\] \[w_l(p)=\exp(-(-\ln p)^{\gamma_l})\]
And then in the simplest case we have
\[PT_G(Prospect) = \sum_{i=1}^{N}w_g(p_i)u_g(x_i)\]
for gains and
\[PT_L(Prospect) = \sum_{i=1}^{N}w_l(p_i)u_l(x_i)\] for losses. For mixed prospects (i.e. lotteries including both gains and losses), we sum up the \(PT()\) for the positive and the negative part:
\[PT_{MIXED}(Prospect) = PT_G(Prospect)+PT_L(Prospect)\]
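The formulas above can be sketched in R for a single mixed prospect. The parameter values here are purely illustrative (they are not estimates from the data set we load below):

```r
# - a sketch of PT evaluation for a mixed prospect:
# - gain 100 with p = .5, lose 50 with p = .5
# - (illustrative parameter values, not estimates from data)
w      <- function(p, gamma) exp(-(-log(p))^gamma)   # Prelec weighting
u_gain <- function(x, rho) x^rho
u_loss <- function(x, rho, lambda) -lambda * (-x)^rho
rho_g <- .88; rho_l <- .88; lambda <- 2.25
gamma_g <- .61; gamma_l <- .69
pt_g <- w(.5, gamma_g) * u_gain(100, rho_g)
pt_l <- w(.5, gamma_l) * u_loss(-50, rho_l, lambda)
pt_mixed <- pt_g + pt_l
pt_mixed  # negative: loss aversion makes this fair-looking gamble unattractive
```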
Inspect some Monetary Equivalents (MEq) data (Tversky & Kahneman, 1992):
# - load some MEq experimental data
meq_data <- read.csv(paste0(kt92_data_dir, "kt92.csv"),
header = TRUE,
check.names = FALSE,
row.names = 1,
stringsAsFactors = FALSE)
# - percents to probability in meq_data
meq_data$p1 <- meq_data$p1/100
meq_data$p2 <- meq_data$p2/100
print(meq_data)
Start simple: let’s estimate the EUT parameter (there is only one: the power-utility \(\rho\) exponent) via Least Squares.
utility_power() is our EUT utility function.
# - power utility function
utility_power <- function(x, rho) {
u <- x^rho
return(u)
}
A function eut_predict() to predict monetary equivalents from a dataset via EUT:
# - function: EUT predictions
eut_predict <- function(data, rho) {
# - utility functions
u1 <- ifelse(data$v1 >= 0,
utility_power(data$v1, rho),
-utility_power(-data$v1, rho)
)
u2 <- ifelse(data$v2 >= 0,
utility_power(data$v2, rho),
-utility_power(-data$v2, rho)
)
# - predictions: utility scale
predictions <- data$p1*u1 + data$p2*u2
# - predictions: value scale
predictions <- ifelse(predictions >= 0,
predictions^(1/rho),
-((-predictions)^(1/rho)))
# - output
return(predictions)
}
NOTE. The predictions: value scale part of the code is very important, because we do not wish to make predictions on the utility scale but rather on the value (MEq) scale. So, let’s not forget that
\[MEq = EU^{1/\rho}\] for gains (\(EU \geq 0\)), and
\[MEq = -\left((-EU)^{1/\rho}\right)\] for losses (\(EU < 0\)), where \(EU\) is the expected utility of the prospect.
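A quick sanity check (a minimal sketch, separate from eut_predict()) that the back-transformation recovers the original value scale:

```r
# - mapping a value to the utility scale and back recovers the value
rho <- .76
x <- c(42, -42)
u <- ifelse(x >= 0, x^rho, -((-x)^rho))              # value -> utility
back <- ifelse(u >= 0, u^(1/rho), -((-u)^(1/rho)))   # utility -> value (MEq)
all.equal(back, x)  # TRUE
```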
And test our predictions for \(\rho=.76\):
preds <- eut_predict(data = meq_data, rho = .76)
print(preds)
[1] 2.4164651 20.0852807 43.5274054 -2.4164651 -20.0852807 -43.5274054
[7] 1.9414152 16.1367400 40.1705613 68.4868080 93.4735968 -1.9414152
[13] -16.1367400 -40.1705613 -68.4868080 -93.4735968 0.4671443 9.6658605
[19] 80.3411227 174.1096214 197.3725857 -0.4671443 -9.6658605 -80.3411227
[25] -174.1096214 -197.3725857 0.9342886 394.7451714 -0.9342886 -394.7451714
[31] 54.6116284 73.9745732 94.6469643 -54.6116284 -73.9745732 -94.6469643
[37] 54.3353025 72.4912535 96.8153714 122.7129797 144.4385997 -54.3353025
[43] -72.4912535 -96.8153714 -122.7129797 -144.4385997 104.5872226 123.4136252
[49] 147.9491464 173.5050929 194.6292955 -104.5872226 -123.4136252 -147.9491464
[55] -173.5050929 -194.6292955
Now the objective function eut_sse(): it computes the \(SSE\) following the predictive pass of eut_predict()
# - optimize the EUT model via LS
eut_sse <- function(params) {
rho <- abs(params[1])
preds <- eut_predict(meq_data, rho)
sse = sum((meq_data$meq - preds)^2)
return(sse)
}
Test for \(\rho=.76\):
eut_sse(params=.76)
[1] 7535.473
And we want to minimize eut_sse() for the MEq dataset at hand. We will use Nelder-Mead with R’s optim():
# - random initial value
init_rho <- runif(1, 0, 1)
solution <- optim(par = init_rho,
fn = eut_sse,
method = "Nelder-Mead",
control = list("maxit" = 10e6))
print(paste0("Estimated Rho of ", round(solution$par, 2)))
[1] "Estimated Rho of 0.7"
print(paste0("Convergence check: ", solution$convergence))
[1] "Convergence check: 0"
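A side note: optim() warns that Nelder-Mead is unreliable in one dimension. For a single parameter, base R’s optimize() is a convenient cross-check; the sketch below uses a toy objective with a known minimum, not eut_sse() itself, but eut_sse() could be passed in the same way:

```r
# - one-dimensional minimization with base R's optimize()
# - (toy objective with a known minimum at rho = .7)
toy_sse <- function(rho) (rho - .7)^2 + 1
sol <- optimize(toy_sse, interval = c(0, 1))
round(sol$minimum, 2)  # 0.7
```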
Now let’s predict the observed MEqs from the estimated parameter:
# - predict from the optimized EUT model
meq_data$eut_meq <- eut_predict(meq_data,
rho = solution$par)
print(meq_data)
And plot the predictions:
# - plot predictions
ggplot(data = meq_data,
aes(x = eut_meq, y = meq)) +
geom_smooth(method = "lm", size = .25, color = "red", se = FALSE) +
geom_point(size = 1.5, color = "black") +
geom_point(size = 1, color = "white") +
ggtitle(paste0("EUT, Estimated Rho of ", round(solution$par, 2))) +
xlab("Predicted MEq") +
ylab("Observed MEq") +
theme_bw() +
theme(panel.border = element_blank()) +
theme(plot.title = element_text(hjust=.5, size = 10))
Let’s run 100 optimizations to check that we are not stuck in a local minimum:
# - many optimization runs
init_rho <- runif(100, 0, 1)
# - run
results <- lapply(init_rho, function(x) {
# - optimization run
optim_sol <- tryCatch(optim(par = x,
fn = eut_sse,
method = "Nelder-Mead",
control = list("maxit" = 10e6)),
error = function(condition) {
return(NULL)
})
if (!is.null(optim_sol)) {
# - predictions
preds <- eut_predict(meq_data, rho = optim_sol$par)
return(
list(rho = optim_sol$par,
sse = optim_sol$value,
convergence = optim_sol$convergence,
predictions = preds)
)
} else {
return(
list(rho = NULL,
sse = NULL,
convergence = NULL,
predictions = NULL)
)
}
})
# - find best solution
sse_list <- sapply(results, function(x) x$sse)
best_run <- which.min(sse_list)[1]
print(paste0("Check convergence: ", results[[best_run]]$convergence))
[1] "Check convergence: 0"
estimated_rho <- results[[best_run]]$rho
optimal_predictions <- results[[best_run]]$predictions
# - plot predictions
meq_data$eut_meq <- optimal_predictions
ggplot(data = meq_data,
aes(x = eut_meq, y = meq)) +
geom_smooth(method = "lm", size = .25, color = "red", se = FALSE) +
geom_point(size = 1.5, color = "black") +
geom_point(size = 1, color = "white") +
ggtitle(paste0("EUT, Estimated Rho of ", round(estimated_rho, 2))) +
xlab("Predicted MEq") +
ylab("Observed MEq") +
theme_bw() +
theme(panel.border = element_blank()) +
theme(plot.title = element_text(hjust=.5, size = 10))
The EUT model error surface from a sample of 100,000 random parameter values:
# - sample parameters from a range for plotting purposes
sample_parameters <- data.frame(rho = runif(100000, 0, 1))
sample_parameters$sse <- sapply(sample_parameters$rho, eut_sse)
head(sample_parameters)
ggplot(data = sample_parameters,
aes(x = rho, y = sse)) +
geom_line(size = .25, color = "black", group=1) +
ggtitle("EUT Model Error Surface ") +
xlab("Rho") +
ylab("SSE") +
theme_bw() +
theme(panel.border = element_blank()) +
theme(plot.title = element_text(hjust=.5, size = 10))
For the Maximum-Likelihood Estimation (MLE) of EUT, let’s assume that the decision maker computes the utility of the gamble and its MEq exactly as EUT prescribes, and then responds with added random noise distributed as \(Normal(0, \sigma)\).
Then we can obtain the likelihood of each MEq observation by evaluating the \(Normal(0, \sigma)\) density at the residual (prediction − response), estimating one additional response model parameter, \(\sigma\).
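Note that under this Gaussian response model the negative log-likelihood is, at a fixed \(\sigma\), an affine function of the SSE, which is why the ML estimate of \(\rho\) will coincide with the least-squares estimate. A minimal check of the closed form:

```r
# - Gaussian NLL equals n/2 * log(2*pi*sigma^2) + SSE / (2*sigma^2)
set.seed(1)
residuals <- rnorm(10)
sigma <- 2
nll <- -sum(dnorm(residuals, mean = 0, sd = sigma, log = TRUE))
n <- length(residuals)
nll_closed <- n/2 * log(2 * pi * sigma^2) + sum(residuals^2) / (2 * sigma^2)
all.equal(nll, nll_closed)  # TRUE
```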
The function eut_likelihood() returns the negative log-likelihood for a dataset given parameters \(\rho\) and \(\sigma\):
# - function: EUT model likelihood
eut_likelihood <- function(data, rho, sigma) {
# - utility functions
u1 <- ifelse(data$v1 >= 0,
utility_power(data$v1, rho),
-utility_power(-data$v1, rho)
)
u2 <- ifelse(data$v2 >= 0,
utility_power(data$v2, rho),
-utility_power(-data$v2, rho)
)
# - predictions: utility scale
predictions <- data$p1*u1 + data$p2*u2
# - predictions: value scale
predictions <- ifelse(predictions >= 0,
predictions^(1/rho),
-((-predictions)^(1/rho)))
# - likelihood
# - assume that the decision maker computes
# - the utility of the gamble and its MEQ, and then
# - responds with an added random noise of Normal(0, sigma)
neg_loglike <- -sum(
dnorm(predictions - data$meq, mean = 0, sd = sigma, log = TRUE)
)
# - output
return(neg_loglike)
}
Test eut_likelihood for \(\rho=.76\) and \(\sigma=11\):
nll <- eut_likelihood(data = meq_data, rho = .76, sigma = 11)
print(nll)
[1] 216.881
Introduce the objective function eut_mle():
# - optimize the EUT model via MLE
eut_mle <- function(params) {
rho <- abs(params[1])
sigma <- abs(params[2])
nll <- eut_likelihood(data = meq_data, rho, sigma)
return(nll)
}
And optimize eut_mle() for meq_data:
# - random initial value
init_rho <- runif(1, 0, 1)
init_sigma <- runif(1, 0, 1)
solution <- optim(par = c(init_rho, init_sigma),
fn = eut_mle,
method = "Nelder-Mead",
control = list("maxit" = 10e6))
print(paste0("Estimated Rho of ", round(abs(solution$par[1]), 2)))
[1] "Estimated Rho of 0.7"
print(paste0("Estimated Sigma of ", round(abs(solution$par[2]), 2)))
[1] "Estimated Sigma of 11.53"
print(paste0("Convergence check: ", solution$convergence))
[1] "Convergence check: 0"
# - predict from the optimized EUT model
meq_data$eut_meq <- eut_predict(meq_data,
rho = abs(solution$par[1]))
# - plot predictions
ggplot(data = meq_data,
aes(x = eut_meq, y = meq)) +
geom_smooth(method = "lm", size = .25, color = "red", se = FALSE) +
geom_point(size = 1.5, color = "black") +
geom_point(size = 1, color = "white") +
ggtitle(paste0("EUT, Estimated Rho of ", round(abs(solution$par[1]), 2))) +
xlab("Predicted MEq") +
ylab("Observed MEq") +
theme_bw() +
theme(panel.border = element_blank()) +
theme(plot.title = element_text(hjust=.5, size = 10))
Check by 100 optimization runs:
# - many optimization runs
init_pars <- lapply(1:100, function(x) {
return(
list(
init_rho = runif(1, 0, 1),
init_sigma = runif(1, 0, 1)
)
)
})
# - run
results <- lapply(init_pars, function(x) {
init_rho <- x$init_rho
init_sigma <- x$init_sigma
# - optimization run
optim_sol <- tryCatch(optim(par = c(init_rho, init_sigma),
fn = eut_mle,
method = "Nelder-Mead",
control = list("maxit" = 10e6)),
error = function(condition) {
return(NULL)
})
if (!is.null(optim_sol)) {
# - predictions
preds <- eut_predict(meq_data, rho = abs(optim_sol$par[1]))
return(
list(rho = optim_sol$par[1],
sigma = optim_sol$par[2],
nll = optim_sol$value,
convergence = optim_sol$convergence,
predictions = preds)
)
} else {
return(
list(rho = NULL,
sigma = NULL,
nll = NULL,
convergence = NULL,
predictions = NULL)
)
}
})
# - find best solution
nll_list <- sapply(results, function(x) x$nll)
best_run <- which.min(nll_list)[1]
print(paste0("Check convergence: ", results[[best_run]]$convergence))
[1] "Check convergence: 0"
estimated_rho <- abs(results[[best_run]]$rho)
print(paste0("Estimated Rho: ", estimated_rho))
[1] "Estimated Rho: 0.699160388762866"
estimated_sigma <- abs(results[[best_run]]$sigma)
print(paste0("Estimated Sigma: ", estimated_sigma))
[1] "Estimated Sigma: 11.5171137364202"
optimal_predictions <- results[[best_run]]$predictions
# - plot predictions
meq_data$eut_meq <- optimal_predictions
ggplot(data = meq_data,
aes(x = eut_meq, y = meq)) +
geom_smooth(method = "lm", size = .25, color = "red", se = FALSE) +
geom_point(size = 1.5, color = "black") +
geom_point(size = 1, color = "white") +
ggtitle(paste0("EUT, Estimated Rho of ", round(estimated_rho, 2))) +
xlab("Predicted MEq") +
ylab("Observed MEq") +
theme_bw() +
theme(panel.border = element_blank()) +
theme(plot.title = element_text(hjust=.5, size = 10))
# - sample parameters from a range for plotting purposes
sample_parameters <- data.frame(rho = runif(100000, 0, 1.5),
sigma = runif(100000, 0, 15))
sample_parameters$nll <- apply(sample_parameters, 1, function(x) {
eut_mle(params = c(x[1], x[2]))
})
head(sample_parameters)
plot_ly() %>%
add_trace(data = sample_parameters,
x = sample_parameters$rho,
y = sample_parameters$sigma,
z = sample_parameters$nll,
type = "mesh3d",
intensity = sample_parameters$nll) %>%
layout(
modebar = list(orientation = "v"),
title = "The EUT Negative Log-Likelihood function",
scene = list(
xaxis = list(title = "Rho", range = c(0, 1.5)),
yaxis = list(title = "Sigma", range = c(0, 15)),
zaxis = list(title = "Neg-LogLike", range = c(0, 1000))
)) %>% hide_colorbar()